The study of data-parallel domain re-organization and thread-mapping techniques is a relevant topic, as such techniques can increase the efficiency of GPU computations on spatial discrete domains with non-box-shaped geometry. In this work we study the potential benefits of applying a succinct data re-organization of a tetrahedral data-parallel domain of size $\mathcal{O}(n^3)$ combined with an efficient block-space GPU map of the form $g:\mathbb{N} \rightarrow \mathbb{N}^3$. Results from the analysis suggest that, in theory, the combination of these two optimizations produces a significant performance improvement: the block-based data re-organization allows a coalesced one-to-one correspondence at local thread-space, while $g(\lambda)$ produces an efficient block-space spatial correspondence between groups of data and groups of threads, reducing the number of unnecessary threads from $\mathcal{O}(n^3)$ to $\mathcal{O}(n^2\rho^3)$, where $\rho$ is the linear block-size and typically $\rho^3 \ll n$. From the analysis, we obtained that a block-based succinct data re-organization can provide up to $2\times$ improved performance over a linear data organization, while the map can be up to $6\times$ more efficient than a bounding-box approach. The results from this work can serve as a useful guide for more efficient GPU computation on tetrahedral domains found in spin lattice, finite element and special n-body problems, among others.
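To make the idea of a block-space map $g:\mathbb{N} \rightarrow \mathbb{N}^3$ concrete, the following is a minimal Python sketch, not necessarily the map analysed in this work. It assumes the tetrahedral block domain is the set of block coordinates with $0 \le x \le y \le z$, and it inverts the tetrahedral and triangular numbers by integer search rather than by a closed-form root expression. The helper names `tet`, `tri`, `g` and `block_counts` are hypothetical and introduced only for illustration.

```python
def tet(k: int) -> int:
    """Tetrahedral number: count of triples with 0 <= x <= y <= z < k."""
    return k * (k + 1) * (k + 2) // 6


def tri(k: int) -> int:
    """Triangular number: count of pairs with 0 <= x <= y < k."""
    return k * (k + 1) // 2


def g(lam: int) -> tuple[int, int, int]:
    """Map a linear block index lam to block coordinates (x, y, z).

    Assumed layout (illustrative only): blocks are enumerated layer by
    layer in z, and row by row in y within each layer.
    """
    # z is the largest integer whose tetrahedral number does not exceed lam.
    z = 0
    while tet(z + 1) <= lam:
        z += 1
    rem = lam - tet(z)  # offset of the block inside the triangular layer z
    # y is the largest integer whose triangular number does not exceed rem.
    y = 0
    while tri(y + 1) <= rem:
        y += 1
    x = rem - tri(y)    # offset of the block inside row y
    return x, y, z


def block_counts(n: int, rho: int) -> tuple[int, int]:
    """Blocks launched by a bounding box versus by the tetrahedral map."""
    m = -(-n // rho)    # ceil(n / rho): blocks per side
    return m ** 3, tet(m)


if __name__ == "__main__":
    box, tetra = block_counts(512, 8)
    print(box, tetra, box / tetra)
    print([g(lam) for lam in range(5)])
```

Under these assumptions, $n = 512$ and $\rho = 8$ give 45760 blocks for the tetrahedral map against 262144 for the bounding box, a ratio of about 5.7 that approaches 6 as $n/\rho$ grows. A GPU implementation would typically replace the integer searches with a constant-time expression evaluated once per block.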